Abstract

The influence of node mobility on the convergence time of averaging gossip algorithms in networks is studied. It is shown that a small number of fully mobile nodes can yield a significant decrease in convergence time. A method is developed for deriving lower bounds on the convergence time by merging nodes according to their mobility pattern. This method is used to show that if the agents have one-dimensional mobility in the same direction, the convergence time is improved by at most a constant factor. Upper bounds on the convergence time are obtained using techniques from the theory of Markov chains; they show that simple models of mobility can dramatically accelerate gossip as long as the mobility paths significantly overlap. Simulations show that these bounds are still valid for more general mobility models that seem analytically intractable, and further illustrate that different mobility patterns can have significantly different effects on the convergence of distributed algorithms.

1 Introduction 

Gossip algorithms are distributed message-passing schemes that are used to disseminate and process information in networks. Average consensus [1-3] and averaging gossip algorithms [4, 5] form an important special case of schemes that can compute linear functions of the data in a robust and distributed way. Such schemes have found numerous uses in distributed estimation, localization and optimization [6-8], and also in compressive sensing of sensor measurements and field estimation [9]. In this paper we study gossip algorithms that compute linear functions and do not discuss related problems like information dissemination (see e.g. [10,11] and references therein).

Gossip algorithms are a natural fit for wireless ad-hoc and sensor network applications because of their distributed and robust nature. Recently the broadcast nature of wireless communication has been exploited to improve convergence [12]. Another key feature of some wireless networks is node mobility; to the best of our knowledge, the impact of mobility on gossip algorithms has not been significantly investigated. In this paper we analyze how mobility can (or cannot) help the convergence of gossip algorithms. For fixed nodes in a random geometric graph or grid (both popular model topologies for large wireless ad-hoc and sensor networks), standard gossip is extremely wasteful in terms of communication requirements: even optimized standard gossip algorithms on a grid converge very slowly, requiring $\Theta(n^2\log\epsilon^{-1})$ messages [5,13] to compute the average within accuracy $\epsilon$. Observe that this is of the same order as requiring every node to flood its estimate to all other nodes. The obvious solution of averaging numbers on a spanning tree and flooding the average back to all the nodes requires only $O(n)$ messages. Constructing and maintaining a spanning tree in dynamic and ad-hoc networks introduces significant overhead and complexity, but a quadratic number of messages is a high price to pay for fault tolerance. In this context, what kinds of mobility patterns are beneficial, and how many mobile agents are needed to boost the convergence speed? Our results suggest that certain kinds of mobility can, in some cases, significantly accelerate convergence.
Our results are a first step toward understanding how mobility can impact the convergence of iterative message-passing schemes, at least for the special case of pairwise averaging, where the convergence behavior is better understood.

Main Results: Our first result is that if $m$ nodes have full mobility and the others are fixed, the convergence time drops to $O((n^2/m)\log\epsilon^{-1})$. Therefore, even a vanishingly small fraction of mobile nodes can change the order of the number of messages required for convergence. In particular, if any constant fraction of nodes have full mobility, the convergence time drops to $\Theta(n\log\epsilon^{-1})$, the same order as for a fully connected graph.

Our second result is that some mobility patterns might not be beneficial. We show that even if all the nodes of the network have one-dimensional mobility in the same direction (e.g. horizontal), there is no benefit in the convergence time, up to constants. Intuitively, this is because the information must still diffuse across the other direction (e.g. vertical). Finally, we show that one-dimensional mobility with a randomly selected direction is as good as full mobility.

In order to obtain these results, we develop a novel method for deriving lower bounds on the convergence time of gossip algorithms with mobile nodes by merging nodes with similar mobility regions. This method is based on a characterization of the convergence time of Markov chains in terms of a functional called the Dirichlet form [14]. Our upper bounds are derived using the so-called Poincaré inequality [15] and the related canonical path method [16]; a version of this technique has also been used previously to study gossip algorithms [17].

2 Network model and preliminaries 

2.1 Time model 

We use the asynchronous time model [5,18], which is well matched to the distributed nature of wireless networks. In particular, we assume that each sensor has an independent clock whose "ticks" are distributed as a rate-$\lambda$ Poisson process. Our analysis is based on measuring time in terms of the number of ticks of an equivalent single virtual global clock ticking according to a rate-$n\lambda$ Poisson process. An exact analysis of the time model can be found in [5]. We will refer to the time between two consecutive clock ticks as one timeslot.
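The superposition property behind this time model can be checked numerically. The sketch below is illustrative only: the function name `global_clock_ticks`, the rate $\lambda = 1$ and the finite horizon are our assumptions. It merges $n$ independent rate-$\lambda$ Poisson clocks; the merged tick sequence should have rate close to $n\lambda$, with every node equally likely to own a given timeslot.

```python
import random

def global_clock_ticks(n, lam, horizon, seed=0):
    """Superpose n independent rate-lam Poisson clocks on [0, horizon).

    Returns a time-sorted list of (time, node) pairs; each pair is one
    timeslot of the equivalent global clock of rate n * lam.
    """
    rng = random.Random(seed)
    ticks = []
    for node in range(n):
        t = rng.expovariate(lam)          # exponential inter-tick gaps
        while t < horizon:
            ticks.append((t, node))
            t += rng.expovariate(lam)
    ticks.sort()
    return ticks

ticks = global_clock_ticks(n=50, lam=1.0, horizon=200.0)
rate = len(ticks) / 200.0                 # empirical global rate, about n * lam
```

Equivalently, one could draw a single rate-$n\lambda$ clock and assign each tick to a node chosen uniformly at random; the two constructions have the same distribution.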

Throughout this paper we analyze the number of required messages without worrying about delay. We can therefore adjust the length of the timeslots relative to the communication time so that, with high probability, only one packet exists in the network at each timeslot. Note that this assumption is made only for analytical convenience; in a practical implementation several packets might coexist in the network, but the associated issues are beyond the scope of this work.

2.2 Network and mobility model 

Suppose we have a collection $\mathcal{A}$ of $n$ agents. At the first timeslot, each agent $i$ starts at some initial location with a scalar value $x_i(0)$. We denote the vector of initial values by $\mathbf{x}(0)$. The objective of our algorithm is for every agent to estimate the average

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i(0) . \tag{1}$$

In order to accomplish this goal, the agents pass messages to each other to communicate their estimates. We assume that this communication always succeeds.¹ We also assume that the messages contain real numbers; the effects of message quantization in gossip and consensus algorithms are an active area of research [19,20].

The $n$ agents can move in an area $\Omega$. For example, we may take $\Omega$ to be a graph with vertex set $V$ and edge set $E$; agents at locations $v$ and $v'$ can communicate if either $v = v'$ or $(v, v') \in E$. Another example is taking $\Omega$ to be the unit square and allowing agents at $v$ and $v'$ to communicate if the distance $d(v, v')$ is less than some radius $r(n)$.

¹ Note, however, that gossip algorithms remain robust to communication and agent failures.

In this paper we use two main examples. The first is the $\sqrt{n}\times\sqrt{n}$ discrete lattice on the torus; we assume that the agents start at distinct sites on the torus. The second example is the random geometric graph (RGG) model on the unit torus.² In this model, the agents' initial locations are generated by selecting $n$ locations uniformly at random in the unit square. Agents can communicate with each other if the distance between them is less than $r(n) = \sqrt{5c\log n/n}$, where $c > 10$ ensures some useful regularity properties [17] discussed subsequently.

Under agent-based mobility, at each time step agent $i$ moves to a new location in $\Omega$ chosen according to a fixed probability distribution $\mu_i$. Therefore the sequence of agent locations $l_i(1), l_i(2), \ldots, l_i(t)$ is independent and identically distributed (iid) according to $\mu_i$. We call the collection of distributions $\{\mu_i : i \in \mathcal{A}\}$ an agent-based mobility pattern. Our theoretical results in this paper are for agent-based mobility. Some examples of agent-based mobility are:

1. A simple example of agent-based mobility is full uniform mobility on the whole graph. In this model, $\mu_i$ is the uniform distribution on $\Omega$ for each $i \in \mathcal{A}$. This corresponds to the case where each agent is equally likely to be at any location in the graph at time $t$. This is similar to the model proposed by Grossglauser and Tse [21]. We will also consider a static network with $m$ fully mobile agents added to it.

2. In the horizontal mobility model, each agent selects a new horizontal location uniformly at each time step. For the torus, the agent selects a new column uniformly; for the RGG, it selects a new horizontal coordinate uniformly from $[0, 1]$.

3. In the bidirectional model, each agent selects equiprobably, once and for all, whether it will move horizontally or vertically. At each time step, the horizontal agents select a new horizontal coordinate uniformly, and the vertical agents select a new vertical coordinate uniformly.

4. In the local model for the torus, an agent that starts at location $(i, j)$ chooses a new location uniformly in the $(2m+1)\times(2m+1)$ square centered at $(i, j)$. That is, the horizontal coordinate is uniformly distributed in $\{i - m, \ldots, i + m\} \pmod{\sqrt{n}}$ and the vertical coordinate is chosen uniformly in $\{j - m, \ldots, j + m\} \pmod{\sqrt{n}}$.
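A minimal sketch of how one location could be drawn under each of the four models on the torus follows. The encoding is hypothetical: sites are `(x, y)` pairs with `x` the horizontal coordinate, `side` stands for $\sqrt{n}$, and the bidirectional direction is assumed to be drawn once per agent by the caller and passed in as `direction`.

```python
import random

def sample_location(model, start, side, rng, m=1, direction=None):
    """Draw a new site on a side x side torus for an agent whose initial
    site is start = (x0, y0). Locations are iid across timeslots."""
    x0, y0 = start
    if model == "full":            # uniform over the whole torus
        return (rng.randrange(side), rng.randrange(side))
    if model == "horizontal":      # new column, same row
        return (rng.randrange(side), y0)
    if model == "bidirectional":   # direction ("h" or "v") fixed for all time
        if direction == "h":
            return (rng.randrange(side), y0)
        return (x0, rng.randrange(side))
    if model == "local":           # uniform in the (2m+1) x (2m+1) square
        return ((x0 + rng.randint(-m, m)) % side,
                (y0 + rng.randint(-m, m)) % side)
    raise ValueError(f"unknown mobility model: {model}")

rng = random.Random(1)
directions = [rng.choice("hv") for _ in range(16)]   # bidirectional coin flips
```

Because each call ignores the agent's previous location, successive positions are independent, which matches the agent-based mobility assumption.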

The key assumption in all our mobility models is that in each gossip timeslot, the positions of the mobile agents are selected independently from some distribution supported on a subregion of the space, similarly to Grossglauser and Tse [21]. Popular mobility models like the random walk model [22,23], the random waypoint model [24], and the random direction model [25] have time dependencies. If, however, the duration of one gossip timeslot is comparable to or larger than the mixing time of the mobility model, the positions of the agents will be approximately independent (see also [26]). If delay is not an issue, we can always set the duration of the gossip timeslot to have that property; in simulations we show that if we do not allow the mobility model to mix, mobility is not as helpful. Therefore, our preliminary experimental evaluation suggests that our analytic results could be used to bound these more realistic mobility models.

3 Algorithm and main results 
3.1 The algorithm 

The gossip algorithm that we will consider is a simple extension of the standard nearest-neighbor gossip 
of Boyd et al. [5] that includes the mobility model in a natural way. At each time step, the agents move 
independently to new locations. One agent is selected at random, chooses one of its neighbors according to 
the graph G, and performs a pairwise average with that neighbor. More precisely, at each time t = 1,2,... 
the following events occur: 

² The unit torus is formed from the unit square by "gluing" opposite edges together.
1. Each agent $i \in \mathcal{A}$ chooses a new location $l_i(t)$ according to the mobility distribution $\mu_i$.

2. An agent $i$ is selected at random and selects a neighbor $j$ uniformly from the set

$$\mathcal{N}_i(t) = \{k \in \mathcal{A} : (l_i(t), l_k(t)) \in E\} . \tag{2}$$

3. The agents $i$ and $j$ exchange values and update their estimates:

$$x_k(t) = \begin{cases} \frac{1}{2}\left(x_i(t-1) + x_j(t-1)\right) & k = i, j \\ x_k(t-1) & k \neq i, j \end{cases} \tag{3}$$
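The three steps above can be sketched as a single simulated timeslot. This is an illustrative sketch under stated assumptions, not the authors' simulator: every agent has full uniform mobility on a `side` x `side` torus grid, two agents are neighbors when they share a site or occupy grid-adjacent sites, and an agent with an empty neighbor set simply skips its turn (the text does not specify this case). The names `torus_adjacent` and `gossip_step` are hypothetical.

```python
import random

def torus_adjacent(a, b, side):
    """True if sites a and b coincide or are grid neighbours on the torus."""
    dx = min((a[0] - b[0]) % side, (b[0] - a[0]) % side)
    dy = min((a[1] - b[1]) % side, (b[1] - a[1]) % side)
    return dx + dy <= 1

def gossip_step(x, side, rng):
    """One timeslot of mobile pairwise gossip; updates x in place.
    Returns the averaged pair (i, j), or None if agent i had no neighbour."""
    n = len(x)
    # Step 1: every agent draws a fresh location (full uniform mobility).
    loc = [(rng.randrange(side), rng.randrange(side)) for _ in range(n)]
    # Step 2: a uniformly random agent i picks a neighbour from N_i(t).
    i = rng.randrange(n)
    nbrs = [k for k in range(n) if k != i and torus_adjacent(loc[i], loc[k], side)]
    if not nbrs:
        return None
    j = rng.choice(nbrs)
    # Step 3: pairwise average, equation (3); the sum of x is preserved.
    x[i] = x[j] = (x[i] + x[j]) / 2
    return i, j

x = [float(k) for k in range(16)]
rng = random.Random(0)
for _ in range(2000):
    gossip_step(x, side=4, rng=rng)
```

Since each update replaces two values by their mean, the sum of the estimates (and hence $\bar{x}$) is invariant, and only the spread around $\bar{x}$ shrinks.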



Since the algorithm is randomized, we are interested in providing probabilistic bounds on its running time. Given $\epsilon > 0$, the $\epsilon$-averaging time [5] is the earliest time at which the vector $\mathbf{x}(t)$ is $\epsilon$-close to the normalized true average with probability greater than $1 - \epsilon$:

$$T_{\mathrm{ave}}(n, \epsilon) = \sup_{\mathbf{x}(0)} \inf \left\{ t = 0, 1, 2, \ldots : \mathbb{P}\left( \frac{\|\mathbf{x}(t) - \bar{x}\mathbf{1}\|}{\|\mathbf{x}(0)\|} > \epsilon \right) \le \epsilon \right\} , \tag{4}$$

where $\|\cdot\|$ denotes the $\ell_2$ norm. Note that this essentially measures a rate of convergence in probability. The analysis of Denantes et al. [27] shows that bounds on the spectral gap yield an asymptotic deterministic rate of vanishing error. Our bounds can be used both to bound the rate of convergence in probability and to show that the averaging error decays exponentially, asymptotically almost surely.
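For a fixed initial vector, the inner infimum in (4) can be estimated by Monte Carlo. The sketch below assumes, as holds for pairwise averaging, that the normalized error never increases along a run, so the probability in (4) equals the fraction of runs whose first hitting time of the $\epsilon$ level exceeds $t$. The supremum over $\mathbf{x}(0)$ is not taken, and `complete_graph_step` is a hypothetical baseline used only to make the example runnable.

```python
import math
import random

def normalized_error(x, x0):
    """||x(t) - xbar 1|| / ||x(0)||, the quantity inside equation (4)."""
    xbar = sum(x0) / len(x0)
    dev = math.sqrt(sum((v - xbar) ** 2 for v in x))
    return dev / math.sqrt(sum(v * v for v in x0))

def empirical_averaging_time(step, x0, eps, trials, rng):
    """Smallest t with (fraction of runs whose error exceeds eps at t) <= eps,
    for one fixed x(0). Relies on the error being non-increasing in t."""
    hits = []
    for _ in range(trials):
        x, t = list(x0), 0
        while normalized_error(x, x0) > eps:
            step(x, rng)
            t += 1
        hits.append(t)                     # first time the error is <= eps
    hits.sort()
    allowed = math.floor(eps * trials)     # runs permitted to be late at time t
    return hits[trials - allowed - 1]

def complete_graph_step(x, rng):
    """Hypothetical baseline: average two distinct uniformly chosen agents."""
    i, j = rng.sample(range(len(x)), 2)
    x[i] = x[j] = (x[i] + x[j]) / 2

x0 = [1.0] + [0.0] * 19
T = empirical_averaging_time(complete_graph_step, x0, 0.05, 200, random.Random(0))
```

The estimate is an empirical $(1-\epsilon)$-quantile of the hitting times, so it becomes noisy for small $\epsilon$ unless the number of runs grows accordingly.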

3.2 Main results 

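Our main results characterize the benefit (or lack thereof) of mobility in speeding up the convergence of gossip algorithms:

• For horizontal mobility on the random geometric graph and the torus, the averaging time improves by at best a constant factor over the case where the agents are not mobile at all:

$$T^{(\mathrm{torus,horiz})}_{\mathrm{ave}}(n,\epsilon) = \Omega\left(n^2\log\epsilon^{-1}\right) \tag{5}$$

$$T^{(\mathrm{RGG,horiz})}_{\mathrm{ave}}(n,\epsilon) = \Omega\left(\frac{n^2}{\log n}\log\epsilon^{-1}\right) \tag{6}$$

• For bidirectional mobility, where each agent initially selects whether to move vertically or horizontally, the convergence time is within a constant factor of that under full mobility:

$$T^{(\mathrm{torus,bi})}_{\mathrm{ave}}(n,\epsilon) = \Theta\left(n\log\epsilon^{-1}\right) \tag{7}$$

$$T^{(\mathrm{RGG,bi})}_{\mathrm{ave}}(n,\epsilon) = \Theta\left(n\log\epsilon^{-1}\right) \tag{8}$$

• For $n$ non-mobile agents on a $\sqrt{n}\times\sqrt{n}$ torus with $m < n$ additional agents having full mobility, the convergence time is

$$T^{(\mathrm{torus\ plus\ }m,\mathrm{2D})}_{\mathrm{ave}}(n,\epsilon) = \Theta\left(\frac{n^2}{m}\log\epsilon^{-1}\right) . \tag{9}$$

• For the local mobility model, with each agent moving in a square of size $(2m+1)\times(2m+1)$,

$$T^{(\mathrm{torus,local})}_{\mathrm{ave}}(n,\epsilon) = O\left(\frac{n^2\log m}{m^2}\log\epsilon^{-1}\right) . \tag{10}$$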
4 Upper and lower bounds on convergence time



4.1 Convergence analysis 

At each step of the algorithm, the agents update their estimates of the average $\bar{x}$. Let $\mathbf{x}(t)$ denote the vector of estimates at time $t$. For agents $i$ and $j$ define the matrix $W^{(i,j)}$:

$$W^{(i,j)} = I - \frac{1}{2}(\mathbf{e}_i - \mathbf{e}_j)(\mathbf{e}_i - \mathbf{e}_j)^T , \tag{11}$$

where $\mathbf{e}_i$ is the $i$-th elementary vector. If agents $i$ and $j$ average at time $t$, the new vector of estimates is given by

$$\mathbf{x}(t) = W^{(i,j)}\mathbf{x}(t-1) . \tag{12}$$

The randomness in the mobility and in the agent selection induces a probability distribution on the matrices $\{W^{(i,j)} : i, j \in \mathcal{A}\}$. Since the mobility and selection are iid across time, we can write the update as

$$\mathbf{x}(t) = W(t)\,W(t-1)\cdots W(1)\,\mathbf{x}(0) , \tag{13}$$

where $\{W(s)\}$ are iid random matrices. Denote the expected value of this random matrix by $\overline{W} = \mathbb{E}[W(s)]$. It is not hard to see that $\overline{W}$ is a symmetric, doubly stochastic matrix and therefore corresponds to a Markov chain. Let $P_{ij}$ be the probability that agent $i$ is selected in step 2 of the algorithm and that it selects agent $j$ from its neighbor set. Then $\mathbb{P}(W(s) = W^{(i,j)}) = P_{ij} + P_{ji}$, and for $i \neq j$

$$\overline{W}_{ij} = \frac{1}{2}(P_{ij} + P_{ji}) . \tag{14}$$
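Equations (11)-(14) can be checked numerically. In this sketch (assuming NumPy; the helper names `pair_matrix` and `expected_matrix` are ours), `P[i, j]` holds the probability that agent $i$ is selected and picks $j$; any probability mass left over, e.g. when the selected agent has no neighbor, is assumed to leave the vector unchanged.

```python
import numpy as np

def pair_matrix(n, i, j):
    """W^{(i,j)} = I - (1/2)(e_i - e_j)(e_i - e_j)^T, equation (11)."""
    d = np.zeros(n)
    d[i], d[j] = 1.0, -1.0
    return np.eye(n) - 0.5 * np.outer(d, d)

def expected_matrix(P):
    """E[W(s)] when the pair {i, j} averages w.p. P[i, j] + P[j, i]."""
    n = P.shape[0]
    Wbar = np.zeros((n, n))
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = P[i, j] + P[j, i]          # Pr(W(s) = W^{(i,j)})
            Wbar += p * pair_matrix(n, i, j)
            total += p
    # Leftover probability (no averaging in this slot) leaves x unchanged.
    return Wbar + (1.0 - total) * np.eye(n)

# Example: every ordered pair equally likely (complete graph).
n = 6
P = (np.ones((n, n)) - np.eye(n)) / (n * (n - 1))
Wbar = expected_matrix(P)
```

Each off-diagonal entry of the result equals $\frac{1}{2}(P_{ij}+P_{ji})$, as in (14), and every row and column sums to one.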



The pioneering work of Boyd et al. [5] showed that the convergence time of a randomized gossip algorithm is dictated by the mixing time of the Markov chain associated to $\overline{W}$. Mathematically, our problem is to analyze the mixing time of the new chain induced by the new feature (in this case mobility) and then compare it to that of the original graph without mobility. For a Markov chain $\mathcal{M}$ with transition matrix $\overline{W}$, the rate of convergence to the stationary distribution is governed by $\lambda_2(\overline{W})$, the second largest eigenvalue of $\overline{W}$. Note that the largest eigenvalue $\lambda_1(\overline{W})$ is 1. Define the relaxation time $T_{\mathrm{relax}}$ to be the reciprocal of the spectral gap:

$$T_{\mathrm{relax}}(\overline{W}) = \frac{1}{1 - \lambda_2(\overline{W})} . \tag{15}$$

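Theorem 1 (Convergence with $T_{\mathrm{relax}}$ [5]). If $P = (P_{ij})$ is symmetric and $n$ is sufficiently large, then $T_{\mathrm{ave}}(n, \epsilon)$ is bounded by

$$T_{\mathrm{ave}}(n, \epsilon) = \Theta\left(T_{\mathrm{relax}}(\overline{W})\log\epsilon^{-1}\right) . \tag{16}$$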
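Given $\overline{W}$, the relaxation time (15) is a one-line eigenvalue computation. The sketch assumes NumPy and a symmetric $\overline{W}$; since Theorem 1 only fixes the order of $T_{\mathrm{ave}}$, the hypothetical helper `averaging_time_order` returns $T_{\mathrm{relax}}\log\epsilon^{-1}$ without the unknown constants. For uniform pairwise gossip on the complete graph, $\overline{W} = I - (nI - J)/(n(n-1))$ with $J$ the all-ones matrix, whose second eigenvalue is $1 - 1/(n-1)$, so $T_{\mathrm{relax}} = n - 1$.

```python
import numpy as np

def relaxation_time(Wbar):
    """T_relax = 1 / (1 - lambda_2) for a symmetric stochastic matrix, eq. (15)."""
    eigs = np.sort(np.linalg.eigvalsh(Wbar))[::-1]   # descending; eigs[0] is 1
    return 1.0 / (1.0 - eigs[1])

def averaging_time_order(Wbar, eps):
    """T_relax * log(1/eps): the order of T_ave in Theorem 1 (constants unknown)."""
    return relaxation_time(Wbar) * np.log(1.0 / eps)

n = 10
# Uniform pairwise gossip on the complete graph: Wbar = I - (n I - J) / (n (n - 1)).
Wbar = np.eye(n) - (n * np.eye(n) - np.ones((n, n))) / (n * (n - 1))
```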



4.2 Lower bounds 

In this section we turn to a general method for constructing lower bounds on the convergence time of pairwise gossip algorithms under agent-based mobility. The main intuition is to partition the set of vertices in the graph and merge all agents whose mobility is supported in the same element of the partition. This induces a transformation on the Markov chain associated to the gossip algorithm. By using an extremal characterization of the relaxation time for Markov chains, we can lower bound $T_{\mathrm{relax}}(\overline{W})$ in the original gossip algorithm by that of the induced Markov chain. The only remaining issue is to choose a partition that yields a tight lower bound. At the moment this must be done by inspection, but we can use this technique to show that horizontal mobility cannot improve the convergence of gossip on the torus or the RGG.
Theorem 2. Let $\{U_r\}$ be any partition of the set of locations $\Omega$, and let $\widetilde{W}$ be the transition matrix of the chain induced by merging all agents whose mobility is restricted to a single set in the partition. Then

$$T_{\mathrm{ave}}(n, \epsilon) = \Omega\left(T_{\mathrm{relax}}(\widetilde{W})\log\epsilon^{-1}\right) . \tag{17}$$

Proof. We begin with the set $\Omega$ on which the agents in $\mathcal{A}$ can move. Let $\{U_r : r = 1, 2, \ldots, M\}$ be a partition of $\Omega$. Given an agent-based mobility pattern $\{\mu_i\}$, let

$$C_r = \{v \in \mathcal{A} : \mu_v(U_r) = 1\} \tag{18}$$

be the set of agents whose mobility is restricted to $U_r$. We can create a map $F$ on the state set $\mathcal{A}$ of the Markov chain corresponding to the gossip algorithm:

$$F(a) = \begin{cases} C_r & a \in C_r \\ a & \text{otherwise} \end{cases} \tag{19}$$

The map $F$ merges the agents whose mobility is restricted to $U_r$ and leaves the other agents invariant. Let $\mathcal{B}$ denote the image of $F$. For a Markov chain on $\mathcal{A}$ with transition probabilities $\overline{W}_{ij}$ and stationary distribution $\pi$, we can define a new Markov chain on $\mathcal{B}$ with transition probabilities

$$\widetilde{W}_{kl} = \frac{1}{\tilde{\pi}_k}\sum_{i : F(i) = k}\ \sum_{j : F(j) = l}\pi_i\overline{W}_{ij} . \tag{20}$$

This is the chain induced by the function $F$ [14, Chapter 4, p. 37]. The stationary distribution of this chain is $\tilde{\pi}_k = \sum_{i : F(i) = k}\pi_i$.
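The construction (20) amounts to aggregating the probability flows $\pi_i W_{ij}$ block by block. A minimal NumPy sketch follows; the helper `induced_chain` is hypothetical and takes the map $F$ as an integer block label per state. The usage example merges a lazy walk on a 4-cycle into two blocks, $\{0, 1\}$ and $\{2, 3\}$.

```python
import numpy as np

def induced_chain(W, pi, F):
    """Induced chain of eq. (20).

    W: transition matrix on the original states; pi: its stationary law;
    F: sequence mapping each original state to a block label 0..nb-1.
    Returns (W_tilde, pi_tilde).
    """
    F = np.asarray(F)
    nb = int(F.max()) + 1
    pi_t = np.zeros(nb)
    flow = np.zeros((nb, nb))   # flow[k, l] = sum of pi_i W_ij, i in k, j in l
    for i in range(len(F)):
        pi_t[F[i]] += pi[i]
        for j in range(len(F)):
            flow[F[i], F[j]] += pi[i] * W[i, j]
    return flow / pi_t[:, None], pi_t

# Lazy walk on a 4-cycle, merging states {0, 1} and {2, 3}.
I4 = np.eye(4)
W = 0.5 * I4 + 0.25 * (np.roll(I4, 1, axis=1) + np.roll(I4, -1, axis=1))
W_t, pi_t = induced_chain(W, np.full(4, 0.25), [0, 0, 1, 1])
```

Here each block keeps probability $3/4$ of staying put, and $\tilde{\pi}$ is stationary for $\widetilde{W}$, as stated after (20).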

We can express the relaxation time of a Markov chain in terms of the Dirichlet form [14]. Given a real-valued function $g$ on the state space of the Markov chain with transition matrix $W$ and stationary distribution $\pi(\cdot)$, the Dirichlet form is given by

$$\mathcal{E}(g, g) = \frac{1}{2}\sum_{k,l}\pi(k)W_{kl}\left(g(k) - g(l)\right)^2 . \tag{21}$$

The relaxation time is then given by

$$T_{\mathrm{relax}}(W) = \sup_{g}\left\{\frac{\sum_k \pi(k)g(k)^2}{\mathcal{E}(g, g)} : \sum_k \pi(k)g(k) = 0\right\} . \tag{22}$$

The following contraction principle shows that $T_{\mathrm{relax}}$ for an induced chain is at most that of the original chain. The validity of this lemma is mentioned in [14, Chapter 4, p. 37]; here we present a proof, which follows easily from similar arguments in [14]:

Lemma 1. Let $\mathcal{M}$ be a Markov chain on a finite state space $\mathcal{A}$ with transition matrix $W$, and let $F : \mathcal{A} \to \mathcal{B}$ be an arbitrary mapping. Then the relaxation time of the chain on $\mathcal{B}$ with transition matrix $\widetilde{W}$ induced by $F$ as in (20) lower bounds the relaxation time of the original chain:

$$T_{\mathrm{relax}}(\widetilde{W}) \le T_{\mathrm{relax}}(W) . \tag{23}$$

Proof. We use the extremal property of the relaxation time in (22). Let $\tilde{g}$ achieve the supremum in (22) for the induced chain given by $\widetilde{W}$. We can create a function $g$ from $\tilde{g}$ to lower bound $T_{\mathrm{relax}}(W)$: simply set $g(i) = \tilde{g}(k)$ for all $i$ with $F(i) = k$. Then

$$\sum_{i \in \mathcal{A}}\pi(i)g(i)^2 = \sum_{k \in \mathcal{B}}\tilde{\pi}(k)\tilde{g}(k)^2 , \tag{24}$$

and in the same way $\sum_i \pi(i)g(i) = \sum_k \tilde{\pi}(k)\tilde{g}(k) = 0$, so $g$ is feasible in (22) for the original chain. Furthermore, from (20) we can see that the Dirichlet form $\mathcal{E}(g, g)$ is also unchanged. Therefore the supremum in (22) for the original chain is at least as large as that for the induced chain. ■
Note that while the relaxation time of a Markov chain cannot increase when states are merged, as argued above, the same is not true for other quantities like the expected time to go from one state to another. Together, the preceding lemma and Theorem 1 give a lower bound on the convergence time of gossip in a network of mobile nodes, that is, an upper bound on the speedup that mobility can provide. ■
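Lemma 1 can also be spot-checked on random instances; this is a sanity check, not a proof. The sketch assumes NumPy, draws hypothetical symmetric doubly stochastic chains (so $\pi$ is uniform), merges the states by a random partition into non-empty blocks, and compares relaxation times. Because the induced chain is reversible but not symmetric, its spectrum is computed from the similar symmetric matrix $D^{1/2}\widetilde{W}D^{-1/2}$ with $D = \mathrm{diag}(\tilde{\pi})$.

```python
import numpy as np

def induced(W, pi, F, nb):
    """Induced chain of eq. (20) for a map F: state -> block in range(nb)."""
    M = np.zeros((len(F), nb))
    M[np.arange(len(F)), F] = 1.0            # one-hot block membership
    pi_t = M.T @ pi
    W_t = (M.T @ (pi[:, None] * W) @ M) / pi_t[:, None]
    return W_t, pi_t

def t_relax(W, pi):
    """1/(1 - lambda_2) for a chain reversible w.r.t. pi, computed from the
    symmetric matrix D^{1/2} W D^{-1/2}, which has the same spectrum as W."""
    s = np.sqrt(pi)
    S = s[:, None] * W / s[None, :]
    eigs = np.sort(np.linalg.eigvalsh((S + S.T) / 2))[::-1]
    return 1.0 / (1.0 - eigs[1])

def random_symmetric_chain(n, rng):
    """Hypothetical test chain: symmetric, doubly stochastic, uniform pi."""
    A = rng.random((n, n))
    A = (A + A.T) / 2
    np.fill_diagonal(A, 0.0)
    A *= 0.9 / A.sum(axis=1).max()           # keep every row sum below 1
    return A + np.diag(1.0 - A.sum(axis=1))

rng = np.random.default_rng(0)
for trial in range(20):
    n, nb = 8, 3
    W = random_symmetric_chain(n, rng)
    pi = np.full(n, 1.0 / n)
    F = rng.permutation(np.arange(n) % nb)   # every block is non-empty
    W_t, pi_t = induced(W, pi, F, nb)
    assert t_relax(W_t, pi_t) <= t_relax(W, pi) + 1e-9   # Lemma 1
```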

In theory we could optimize the lower bound over all partitions $\{U_r\}$, but for our examples there is an "obvious" partition that yields a meaningful lower bound. We turn first to the $\sqrt{n}\times\sqrt{n}$ torus.

Corollary 1 (Torus with horizontal mobility). Let $G = (V, E)$ be the $\sqrt{n}\times\sqrt{n}$ torus and suppose that the set of agents is $\mathcal{A} = V$. Let the mobility pattern of the $(i, j)$-th agent be uniformly distributed on the set $\{(i, k) : k \le \sqrt{n}\}$, which corresponds to mobility only in the horizontal direction. Then

$$T_{\mathrm{ave}}(n, \epsilon) = \Omega\left(n^2\log\epsilon^{-1}\right) . \tag{25}$$



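Proof. Consider the partition $\{U_i\}$ of $V$, where $U_i$ is the $i$-th row of the torus. Consider two agents, one starting at $(i, j)$ and the other at $(k, l)$, where $k = i + 1$. Then the probability in the algorithm that $(i, j)$ and $(k, l)$ average with each other is the chance that $(i, j)$ is selected, times the probability (over the mobility) that $(i, j)$ and $(k, l)$ are adjacent to each other, times the chance that $(i, j)$ selects $(k, l)$ out of its neighbors. Since the two agents are adjacent only when they land in the same column, we can upper bound this probability:

$$P_{(i,j),(k,l)} \le \frac{1}{n}\cdot\frac{1}{\sqrt{n}}\cdot 1 = \frac{1}{n^{3/2}} . \tag{26}$$

Agents in rows that are not adjacent never average with each other. The chain induced from this partition is therefore a cycle with $\sqrt{n}$ states, where each state corresponds to a row in the original Markov chain. The transitions from row to row are given by (20):

$$\widetilde{W}_{kl} = \frac{1}{\tilde{\pi}_k}\sum_{i \in U_k}\sum_{j \in U_l}\pi_i\overline{W}_{ij} \tag{27}$$

$$\le \sqrt{n}\cdot\sqrt{n}\cdot\sqrt{n}\cdot\frac{1}{n}\cdot O\left(\frac{1}{n\sqrt{n}}\right) \tag{28}$$

$$= O\left(\frac{1}{n}\right) . \tag{29}$$

Therefore the self-transition probability of each state is $1 - O(1/n)$. Let $a = \widetilde{W}_{kl}$ be the transition probability between adjacent rows. The matrix $\widetilde{W}$ is circulant and generated by the vector $(a, 1 - 2a, a, 0, \ldots, 0)$. Its eigenvalues are given by the discrete Fourier transform of this vector (cf. [13]):

$$\lambda_k(\widetilde{W}) = 1 - 2a + 2a\cos\left(\frac{2\pi k}{\sqrt{n}}\right) , \qquad k = 0, 1, \ldots, \sqrt{n} - 1 . \tag{30}$$

In particular, the second-largest eigenvalue can be bounded using the Taylor expansion of the cosine, $\cos x \ge 1 - x^2/2$:

$$\lambda_2(\widetilde{W}) \ge 1 - 2a + 2a\left(1 - \frac{1}{2}\cdot\frac{4\pi^2}{n}\right) = 1 - O\left(\frac{1}{n^2}\right) .$$

Therefore the relaxation time is

$$T_{\mathrm{relax}}(\widetilde{W}) = \Omega\left(n^2\right) , \tag{31}$$

and the bound on the averaging time follows from Theorem 2. ■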
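The circulant eigenvalue formula (30) and the resulting $\Omega(n^2)$ scaling are easy to confirm numerically. The sketch assumes NumPy and, for the scaling check only, sets the row-to-row probability to exactly $a = 1/n$ (the proof only establishes $a = O(1/n)$); `row_cycle_chain` is a hypothetical helper name.

```python
import numpy as np

def row_cycle_chain(k, a):
    """Induced row chain of Corollary 1: a k-state cycle that moves to each
    adjacent row w.p. a and stays put w.p. 1 - 2a (a circulant matrix)."""
    I = np.eye(k)
    return (1 - 2 * a) * I + a * (np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1))

k, a = 12, 0.01
numeric = np.sort(np.linalg.eigvalsh(row_cycle_chain(k, a)))
formula = np.sort(1 - 2 * a + 2 * a * np.cos(2 * np.pi * np.arange(k) / k))  # eq. (30)

# Scaling check with sqrt(n) rows and a = 1/n: T_relax is close to n^2 / (4 pi^2).
n = 900
lam = np.sort(np.linalg.eigvalsh(row_cycle_chain(int(np.sqrt(n)), 1.0 / n)))[::-1]
t_relax = 1.0 / (1.0 - lam[1])
```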

The preceding corollary shows that allowing nodes to move in only one direction gives the same order of convergence time as the torus without any node mobility. That is, sometimes mobility yields no significant benefit in terms of convergence. Even if we add a single agent moving in the vertical direction, we still do not gain anything; the proof follows from the same arguments as Corollary 1.






Corollary 2 (A single vertical mover doesn't help). Let $G = (V, E)$ be the $\sqrt{n}\times\sqrt{n}$ torus and suppose that the set of agents is $\mathcal{A} = V \cup \{e\}$. Let the mobility pattern of the $(i, j)$-th agent in $V$ be uniformly distributed on the set $\{(i, k) : k \le \sqrt{n}\}$, which corresponds to mobility only in the horizontal direction. Let the mobility pattern of $e$ be uniform on $\{(i, 1) : i \le \sqrt{n}\}$. Then for this gossip algorithm,

$$T_{\mathrm{ave}}(n, \epsilon) = \Omega\left(n^2\log\epsilon^{-1}\right) . \tag{32}$$

We could prove in a similar way that adding a constant number of agents moving in the vertical direction does not speed up the convergence appreciably. Our final result in this section shows that 1D unidirectional mobility cannot help speed up the convergence of gossip on random geometric graphs either. Boyd et al. [5] have shown that the averaging time for standard pairwise gossip on the RGG is $\Theta(nr^{-2}\log\epsilon^{-1})$, which for $r(n) = \Theta(\sqrt{n^{-1}\log n})$ is $\Theta((n^2/\log n)\log\epsilon^{-1})$.

4.3 Upper bounds 

Canonical path method [16]: For any ergodic and reversible Markov chain with transition matrix $W$ and stationary distribution $\pi$, define for each pair $i, j$ of states the capacity of the directed edge $e = (i, j)$ to be

$$C(e) = \pi(i)W_{ij} . \tag{33}$$

For each pair of states we define a demand $D(i, j) = \pi(i)\pi(j)$. A flow is any way of routing $D(i, j)$ units of flow from $i$ to $j$ for all pairs $i, j$ simultaneously. Formally, a flow $F : \mathcal{P} \to \mathbb{R}^{+}$ is a function on the set $\mathcal{P}$ of all simple paths in the transition graph of the Markov chain that satisfies the demand:

$$\sum_{p \in \mathcal{P}_{ij}} F(p) = D(i, j) , \tag{34}$$

where $\mathcal{P}_{ij}$ denotes the set of all paths from $i$ to $j$.

For a flow $F$ we define the load on an edge $e$ to be the total flow routed across that edge:

$$f(e) = \sum_{p \ni e} F(p) . \tag{35}$$

The cost of a flow $F$ is the maximum overload of any edge:

$$\rho(F) = \max_{e}\frac{f(e)}{C(e)} . \tag{36}$$

Finally, define the length $\ell(F)$ of a flow to be the length of the longest flow-carrying path, i.e. the longest $p$ for which $F(p) > 0$.

Using these definitions, we can use the following Poincaré inequality [16] to obtain an upper bound on the inverse spectral gap (relaxation time) of the Markov chain:

$$\frac{1}{1 - \lambda_2(W)} \le \rho(F)\,\ell(F) . \tag{37}$$

Intuitively, if there are no bottlenecks in the transitions between states, the relaxation time of the chain will be small. Any flow $F$ gives an upper bound that depends on the cost $\rho(F)$ of its most congested edge and on its length $\ell(F)$.
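As a small worked instance of (33)-(37), the sketch below (assuming NumPy; both helper names are hypothetical) routes every pair of states of a lazy walk on an $N$-cycle along a shortest arc, computes $\rho(F)\,\ell(F)$, and compares it with the exact relaxation time. This illustrates the canonical path bound on a toy chain, not on a gossip chain from the paper.

```python
import numpy as np
from itertools import product

def poincare_bound_cycle(N, p):
    """rho(F) * l(F) from (33)-(37) for the walk on an N-cycle that moves to
    each neighbour w.p. p (uniform pi), routing every pair of states along a
    shortest arc (ties go clockwise)."""
    pi = 1.0 / N
    cap = pi * p                          # C(e) = pi(i) W_ij, eq. (33)
    load = {}                             # f(e), eq. (35)
    longest = 0                           # l(F)
    for i, j in product(range(N), repeat=2):
        if i == j:
            continue
        cw = (j - i) % N                  # clockwise distance from i to j
        step, length = (1, cw) if cw <= N - cw else (-1, N - cw)
        u = i
        for _ in range(length):
            v = (u + step) % N
            load[(u, v)] = load.get((u, v), 0.0) + pi * pi   # demand D(i, j)
            u = v
        longest = max(longest, length)
    rho = max(load.values()) / cap        # eq. (36)
    return rho * longest                  # right-hand side of (37)

def cycle_relaxation_time(N, p):
    """Exact 1 / (1 - lambda_2) for the same chain, for comparison."""
    I = np.eye(N)
    W = (1 - 2 * p) * I + p * (np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1))
    lam = np.sort(np.linalg.eigvalsh(W))[::-1]
    return 1.0 / (1.0 - lam[1])
```

For $N = 10$ and $p = 0.25$ the bound is about 30, while the exact relaxation time is roughly 10.5: the bound is loose by a constant factor but has the right $N^2$ growth.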

Corollary 3 (Full mobility is optimal). Let $G = (V, E)$ be the $\sqrt{n}\times\sqrt{n}$ torus and suppose that the set of agents is $\mathcal{A} = V$. Let the mobility pattern of every agent in $V$ be uniformly distributed on all of $V$, which corresponds to full mobility. Then for this gossip algorithm,

$$T_{\mathrm{ave}}(n, \epsilon) = O\left(n\log\epsilon^{-1}\right) . \tag{38}$$
Proof. The stationary distribution is uniform, so $\pi_i = 1/n$ for all $i$ and the demand is $D(i, j) = 1/n^2$ for all pairs $(i, j)$. Furthermore, the probability of $i$ and $j$ averaging is $\Theta(1/n^2)$, so the state diagram of the Markov chain is the complete graph with edge capacities $\Theta(1/n^3)$. The simplest flow is to route the demand directly on the edge from $i$ to $j$, which gives a cost of $O(n)$ with a flow of length 1, so the relaxation time is $O(n)$ and the result follows from Theorem 1. ■

A slightly less simple example is a cycle with one fully mobile agent. The cycle has averaging time $\Theta(n^3\log\epsilon^{-1})$ (see [13]). With one mobile agent, the averaging time drops to $O(n^2\log\epsilon^{-1})$.

Corollary 4 (Cycle with one fully mobile agent). Let $G = (V, E)$ be a cycle of $n$ locations and suppose the agents are $\mathcal{A} = V \cup \{v'\}$. Suppose no agent in $V$ can move, but $v'$ has mobility uniformly distributed on $V$, which corresponds to full mobility. Then for this gossip algorithm,

$$T_{\mathrm{ave}}(n, \epsilon) = O\left(n^2\log\epsilon^{-1}\right) . \tag{39}$$

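Proof. The stationary distribution of this chain is uniform, so $\pi_i = 1/(n+1)$ for all $i \in \mathcal{A}$. For $i, j \in V$, the probability that $i$ and $j$ average is 0 unless $i$ and $j$ are neighbors. Otherwise, agent $i$ has its two neighbors on the cycle, plus $v'$ whenever the mobile agent lands at the location of $i$ or at one of its two adjacent locations, which happens with probability $3/n$. Hence

$$P_{ij} = \frac{1}{n+1}\left[\left(1 - \frac{3}{n}\right)\frac{1}{2} + \frac{3}{n}\cdot\frac{1}{3}\right] = \frac{n-1}{2n(n+1)} .$$

For $i \in V$ and $j = v'$ we have

$$P_{iv'} = P_{v'i} = \frac{1}{n+1}\cdot\frac{3}{n}\cdot\frac{1}{3} = \frac{1}{n(n+1)} .$$

Thus, by (14) and (33), the capacity is

$$C(i, j) = \begin{cases} \dfrac{n-1}{2n(n+1)^2} & i, j \in V \text{ neighbors} \\[2mm] \dfrac{1}{n(n+1)^2} & i = v' \text{ or } j = v' \end{cases}$$

The demand is just $D(i, j) = 1/(n+1)^2$ between each pair of agents.

To construct a flow $F$, we route all flow through the mobile agent $v'$. An edge $(i, v')$ for $i \in V$ carries $n$ flows, one to each agent $j \neq i$, each of size $1/(n+1)^2$, for a total of $f(i, v') = n/(n+1)^2$. Similarly, an edge $(v', i)$ carries the same total flow. All flows have length at most 2, so $\ell(F) = 2$. The overload is

$$\rho(F) = \frac{n/(n+1)^2}{1/(n(n+1)^2)} = n^2 .$$

Thus the Poincaré inequality (37) gives an upper bound of $2n^2 = O(n^2)$ on the relaxation time of the chain. The averaging time then follows from Theorem 1. ■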
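The bound of Corollary 4 can be compared with the exact relaxation time of the expected gossip matrix. The sketch assumes NumPy and builds $\overline{W}$ from the pairwise probabilities $P_{ij} = (n-1)/(2n(n+1))$ for cycle neighbors and $P_{iv'} = P_{v'i} = 1/(n(n+1))$ derived above; the helper names are hypothetical.

```python
import numpy as np

def cycle_plus_mobile(n):
    """Expected gossip matrix for n static agents on a cycle (indices 0..n-1)
    plus one fully mobile agent v' (index n), built from the assumed pairwise
    probabilities P_ij = (n-1)/(2n(n+1)) and P_iv' = P_v'i = 1/(n(n+1))."""
    P = np.zeros((n + 1, n + 1))
    p_cycle = (n - 1) / (2 * n * (n + 1))
    p_mobile = 1 / (n * (n + 1))
    for i in range(n):
        P[i, (i + 1) % n] = P[i, (i - 1) % n] = p_cycle
        P[i, n] = P[n, i] = p_mobile
    W = (P + P.T) / 2                          # off-diagonal entries, eq. (14)
    np.fill_diagonal(W, 1 - W.sum(axis=1))     # rows sum to one
    return W, P

def relaxation_time(W):
    lam = np.sort(np.linalg.eigvalsh(W))[::-1]
    return 1.0 / (1.0 - lam[1])

results = {n: relaxation_time(cycle_plus_mobile(n)[0]) for n in (20, 40, 80)}
```

For these sizes the exact relaxation times grow like $n^2$ and stay below the $2n^2$ guaranteed by the flow argument; the probabilities in `P` also sum to one, as they must for one averaging per timeslot.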



5 Examples revisited 

We now turn to our examples of mobility and derive scaling results for gossip with mobility. For the torus, we show that local mobility in a square of area $m^2$ cuts the convergence time by a factor of $m^2$ (up to a $\log m$ factor), and that adding $m$ fully mobile agents cuts the convergence time by a factor of $m$. For the random geometric graph, we show that bidirectional mobility is as good as full mobility, and we prove a lower bound for unidirectional mobility.
Figure 1: Routing flow in the local mobility model.



5.1 The torus 
5.1.1 Local mobility 

An important step in bridging the mobility model here with more realistic mobility models is to consider local mobility, in which an agent moves uniformly in a square of side length $2m + 1$ centered at its initial location.

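Theorem 3. Consider gossip with $n$ agents on the $\sqrt{n}\times\sqrt{n}$ torus $\Omega$. Each agent moves uniformly in a square of side length $2m + 1$ centered at its initial location. Then the averaging time is bounded by

$$T_{\mathrm{ave}}(n, \epsilon) = O\left(\frac{n^2\log m}{m^2}\log\epsilon^{-1}\right) . \tag{41}$$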



Proof. Divide the grid into squares of side length $m$. Then each square contains $m^2$ agents, and the mobility of each agent spans at least its own square and each square adjacent to it. For each pair of agents we must route $D(i, j) = 1/n^2$ units of flow. We do this by routing flows along L-shaped paths, as shown in Figure 1. Agent $i$ spreads its $1/n^2$ units of flow evenly to the $m^2$ agents in the adjacent square; these then split the flow among the $m^2$ agents in the next square, and so on. Thus flow is routed only along inter-square edges. The boundary between two horizontally adjacent squares carries flow from the $O(\sqrt{n}/m)$ squares to its left to the $O(n/m^2)$ squares to its right and above it, i.e. the flow of $O(n^{3/2}/m^3)$ pairs of squares. Each pair of squares contains $m^4$ pairs of agents, each with demand $1/n^2$, and this flow is split evenly among the $m^4$ agent-to-agent edges joining the two adjacent squares, so the load on each left-to-right edge is

$$f(i, j) = O\left(\frac{n^{3/2}}{m^3}\cdot\frac{m^4}{n^2}\cdot\frac{1}{m^4}\right) = O\left(\frac{1}{m^3\sqrt{n}}\right) . \tag{42}$$

The same bound holds for down-to-up edges.

To find the capacity of these edges, we calculate the probability that agents $i$ and $k$ in adjacent squares average with each other. The probability of selecting agent $i$ is $1/n$, and the overlap of the mobility areas of $i$ and $k$ is $\Omega(m^2)$, so the chance that $i$ and $k$ are adjacent after moving is $\Omega(1/m^2)$. With high probability there will be no more than $O(\log m)$ agents for $i$ to choose from, so the chance of selecting $k$ is at worst $\Omega(1/\log m)$. Thus

$$C(i, k) = \Omega\left(\frac{1}{n^2m^2\log m}\right) . \tag{43}$$



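The maximum length of any flow is $O(\sqrt{n}/m)$, so the Poincaré inequality gives

$$\frac{1}{1 - \lambda_2(\overline{W})} = O\left(\frac{n^2\log m}{m^2}\right) , \tag{44}$$

and the averaging time follows from Theorem 1. ■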
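The exponents in (42)-(44) can be tallied in one line. Reading the orders in the proof as an edge load of $O(1/(m^3\sqrt{n}))$, a capacity of $\Omega(1/(n^2m^2\log m))$ and a path length of $O(\sqrt{n}/m)$, the Poincaré bound (37) combines as follows (a bookkeeping check, not an independent derivation):

```latex
\frac{1}{1-\lambda_2(\overline{W})}
  \le \max_e \frac{f(e)}{C(e)}\,\ell(F)
  = O\!\left(\frac{1}{m^{3}\sqrt{n}}\right)\cdot
    O\!\left(n^{2}m^{2}\log m\right)\cdot
    O\!\left(\frac{\sqrt{n}}{m}\right)
  = O\!\left(\frac{n^{2}\log m}{m^{2}}\right).
```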



5.1.2 Adding mobile agents 

The question motivating this work is this: how much can agent mobility improve the convergence speed of gossip or consensus algorithms? Put another way, how much mobility is needed to gain a certain factor of improvement in the convergence? A simple model for which we can answer this question is the following: consider $n$ static agents on the $\sqrt{n}\times\sqrt{n}$ torus together with $m$ mobile agents whose mobility $\mu_i$ is uniform on the torus. Below we use our techniques from earlier sections to show that the averaging time of gossip in this model is $\Theta((n^2/m)\log\epsilon^{-1})$, which for $m = n^{\alpha}$ is $\Theta(n^{2-\alpha}\log\epsilon^{-1})$. For example, adding $\sqrt{n}$ mobile nodes can speed up convergence by a factor of $\sqrt{n}$.

Theorem 4. Consider gossip with $n + m$ agents on the $\sqrt{n}\times\sqrt{n}$ torus $\Omega$. The $n$ static agents $\mathcal{S}$ are positioned on the $n$ nodes of the torus, and the $m$ mobile agents $\mathcal{M}$ have mobility that is uniform on $\Omega$, where $m < n$. Then the averaging time is given by

$$T_{\mathrm{ave}}(n, \epsilon) = \Theta\left(\frac{n^2}{m}\log\epsilon^{-1}\right) . \tag{45}$$



Proof. We first show that for $i \in \mathcal{S}$ and $j \in \mathcal{M}$, the probability $P_{ij}$ that agent $i$ contacts agent $j$ and averages is $\Theta(1/(n(m+n)))$. Agent $i$ is selected with probability $1/(m+n)$, and agent $j$ is in the neighborhood of agent $i$ with probability $5/n$. Therefore

$$P_{ij} = \frac{5}{n(m+n)}\sum_{l=0}^{m-1}\mathbb{P}(L = l)\,\frac{1}{5 + l} , \tag{46}$$

where $L$ is the number of other agents in $\mathcal{M}$ that land in the neighborhood of $i$. The summation is just

$$\sum_{l=0}^{m-1}\mathbb{P}(L = l)\,\frac{1}{5 + l} = \mathbb{E}\left[\frac{1}{5 + L}\right] , \tag{47}$$

which is clearly upper bounded by 1, so

$$P_{ij} = O\left(\frac{1}{n(m+n)}\right) . \tag{48}$$

Since $1/(5 + L)$ is convex in $L$, Jensen's inequality can be used to lower bound

$$\mathbb{E}\left[\frac{1}{5 + L}\right] \ge \frac{1}{\mathbb{E}[5 + L]} = \frac{1}{5 + 5(m-1)/n} \ge \frac{1}{10} , \tag{49}$$

where the last step uses $m < n$. Therefore $P_{ij} = \Omega(1/(n(m+n)))$. By symmetry, we have the same bound on $P_{ji}$.
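The Jensen step can be checked by simulation. The sketch assumes that each of the other $m - 1$ mobile agents independently lands in the five sites forming agent $i$'s neighborhood with probability $5/n$, so that $L \sim \mathrm{Binomial}(m-1, 5/n)$; the helper name `expected_inverse_degree` and the values $n = 400$, $m = 100$ are illustrative choices.

```python
import random

def expected_inverse_degree(n, m, trials, rng):
    """Monte Carlo estimate of E[1/(5 + L)], assuming L ~ Binomial(m - 1, 5/n):
    each of the other m - 1 mobile agents independently lands in the five
    sites forming agent i's neighbourhood with probability 5/n."""
    total = 0.0
    for _ in range(trials):
        L = sum(1 for _ in range(m - 1) if rng.random() < 5 / n)
        total += 1 / (5 + L)
    return total / trials

n, m = 400, 100
estimate = expected_inverse_degree(n, m, 20000, random.Random(0))
jensen = 1 / (5 + 5 * (m - 1) / n)     # 1 / E[5 + L], the Jensen lower bound
```

The estimate sits slightly above the Jensen bound and well above $1/10$, consistent with $P_{ij} = \Theta(1/(n(m+n)))$.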
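To get the lower bound, consider the function $G : \mathcal{S}\cup\mathcal{M} \to \mathcal{S}\cup\{M\}$ that is the identity on $\mathcal{S}$ and merges $\mathcal{M}$ into a single state $M$. We can bound the transition probabilities of the new chain using (20): for $i \in \mathcal{S}$,

$$\widetilde{W}_{iM} = \sum_{j \in \mathcal{M}}\overline{W}_{ij} = \Theta\left(\frac{m}{n(m+n)}\right) , \qquad \widetilde{W}_{Mi} = \frac{1}{\tilde{\pi}_M}\sum_{j \in \mathcal{M}}\pi_j\overline{W}_{ji} = \Theta\left(\frac{1}{n(m+n)}\right) . \tag{50}$$

For $i, k \in \mathcal{S}$ we have $\widetilde{W}_{ik} = \overline{W}_{ik}$.

The new chain is a torus plus an additional central node $M$. The probability of transitioning from the torus to the central node is $\Theta((m/n)/(m+n))$, and for transitioning back it is $\Theta((1/n)/(m+n))$. It can be seen (see the Appendix) that the relaxation time of this chain is $\Omega(n^2/m)$, via the extremal characterization in (22). Thus $T_{\mathrm{ave}}(n, \epsilon) = \Omega((n^2/m)\log\epsilon^{-1})$.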

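We now turn to the upper bound. As before, we construct a flow on the chain. The demand between any two agents $(i, j)$ is $1/(n+m)^2$. Since $P_{ij} = \Theta(1/(n(n+m)))$, the capacity is

$$C(e) = \Theta\left(\frac{1}{n(n+m)^2}\right)$$

for every edge $e = (i, j)$ with at least one endpoint in $\mathcal{M}$. We must now construct a flow that yields an upper bound of $O(n^2/m)$ on the relaxation time. For a pair of states $i \in \mathcal{S}$ and $j \in \mathcal{M}$, we assign $1/(n+m)^2$ to the direct path $(i, j)$. For a pair $i \in \mathcal{S}$ and $j \in \mathcal{S}$, we split $1/(n+m)^2$ equally among the $m$ paths $(i, k, j)$ for $k \in \mathcal{M}$. Finally, for $i \in \mathcal{M}$ and $j \in \mathcal{M}\cup\mathcal{S}$, we again route $1/(n+m)^2$ directly on $(i, j)$. Then

$$f((i, j)) = \begin{cases} \left(1 + \dfrac{n-1}{m}\right)\dfrac{1}{(m+n)^2} & \text{one endpoint in } \mathcal{S}, \text{ the other in } \mathcal{M} \\[2mm] \dfrac{1}{(m+n)^2} & i, j \in \mathcal{M} \end{cases} \tag{51}$$

and edges between two static agents carry no flow. Therefore $\rho(F) = O(n^2/m)$. Since all paths have length at most 2, the Poincaré inequality implies that $T_{\mathrm{relax}}(\overline{W}) = O(n^2/m)$, so Theorem 1 gives $T_{\mathrm{ave}}(n, \epsilon) = O((n^2/m)\log\epsilon^{-1})$. ■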



5.2 Random geometric graphs 
5.2.1 Bidirectional mobility 

We now turn to the case where some agents move horizontally and some vertically. We will prove our results for the random geometric graph model, where $n$ nodes are initially placed uniformly in the unit square $Q$. In the bidirectional mobility model, before the gossip algorithm starts, each node flips a fair coin to decide whether it moves horizontally or vertically, and it keeps moving in that direction throughout the process. Note that this is a one-dimensional mobility model, since each node moves only horizontally or only vertically during the entire execution of the gossip algorithm, never changing direction. Our result is that this mobility model is as good as complete node connectivity.

Theorem 5. Consider the gossip algorithm with $n$ agents under the random geometric graph model and bidirectional mobility. We can choose a connectivity radius $r(n) = \Theta\!\left(\sqrt{\log n/n}\right)$ such that the gossip averaging time is

$$T_{\mathrm{ave}}(n,\epsilon) = \Theta(n\log\epsilon^{-1}). \qquad (52)$$
Figure 2: Random geometric graph example with bidirectional 1D mobility.



Proof. We start by partitioning the space into a grid of squares of area $c_1\log n/n$ (side length $\sqrt{c_1\log n/n}$). Let $B_i$ denote the number of agents whose initial position was in square $i$.

It is well known [17,28-30] that combining a Chernoff bound with a union bound yields uniform bounds on the maximum and minimum load of all the squares:

$$\mathbb{P}\left(\frac{c_1}{2}\log n < B_i < 2c_1\log n \ \ \forall i\right) \ge 1 - n^{1-c_1/8}\,\frac{2}{c_1\log n}.$$

Therefore, selecting $c_1 > 10$, we can show that all the squares have $\Theta(\log n)$ agents with probability at least $1 - 2n^{1-c_1/8}/(c_1\log n)$, which tends to 1 as $n$ grows. By a union bound, the loads remain balanced over any number of repetitions of the experiment that is small compared with the inverse of this failure probability. We set the transmission radius to $r(n) = \sqrt{5c_1\log n/n}$ to guarantee that an agent in a square can always communicate with agents in the four adjacent squares.
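A quick Monte Carlo sketch of the load-balancing claim: drop $n$ points uniformly in the unit square, partition it into $k \times k$ squares with $k = \lfloor\sqrt{n/(c_1\log n)}\rfloor$ (so each square has area at least $c_1\log n/n$), and check that every square holds between $(c_1/2)\log n$ and $2c_1\log n$ points. Rounding $k$ down makes the squares slightly larger than in the text, so the sketch uses the radius $\sqrt{5}/k$, which covers any two points in a $2\times 1$ block of squares. The function name and parameter values are illustrative assumptions.

```python
import math
import random

def square_loads(n: int, c1: float = 10.0, seed: int = 0):
    """Count uniformly placed points per grid square (illustrative sketch)."""
    rng = random.Random(seed)
    k = max(1, int(math.sqrt(n / (c1 * math.log(n)))))  # squares per side
    counts = [0] * (k * k)
    for _ in range(n):
        col = min(int(rng.random() * k), k - 1)
        row = min(int(rng.random() * k), k - 1)
        counts[row * k + col] += 1
    # Two points in horizontally or vertically adjacent squares lie in a
    # 2 x 1 block of squares of side 1/k, hence within distance sqrt(5)/k.
    radius = math.sqrt(5) / k
    return k, counts, radius

n, c1 = 20000, 10.0
k, counts, radius = square_loads(n, c1)
mu = c1 * math.log(n)
print(k, min(counts), max(counts), round(mu, 1), round(radius, 4))
```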

Recall that initially each agent is assigned to be a horizontally moving or vertically moving node by flipping a coin, and it keeps this directionality throughout the process. Denote by $H_i$ the set of nodes that move horizontally and whose initial position was in the $i$-th row of squares. These agents always stay in the $i$-th row. Similarly, let $V_i$ be the set of agents that move vertically in the $i$-th column of squares.



Each square contains in expectation $c_1\log n$ nodes and there are $\sqrt{n/(c_1\log n)}$ squares in each row and column. Since each node flips a fair coin to be assigned to the vertically or the horizontally moving class, the expected cardinalities are

$$\mathbb{E}|H_i| = \mathbb{E}|V_i| = \frac{1}{2}\,c_1\log n\,\sqrt{\frac{n}{c_1\log n}} = \Theta\!\left(\sqrt{n\log n}\right). \qquad (53)$$

Using standard Chernoff bounds we can show that the cardinalities of $H_i$ and $V_i$ are sharply concentrated near their expectation.

Theorem 1 shows that the averaging time of the gossip algorithm is bounded by the inverse spectral gap (relaxation time) of the average matrix $W = \mathbb{E}W(s)$, where the expectation is computed over the mobility of the nodes and the random selection of which nodes gossip.

We now bound the spectral gap using a canonical flow, for which we need to select paths between every pair of states of the Markov chain defined by $W$. The state space is the set of $n$ agents, and $\pi(i) = 1/n$ for each agent $i$ since $W$ is doubly stochastic. The capacities of the edges are proportional to the entries of $W$ (see (14)), where $W_{ij}$ is the average of the probabilities $P_{ij}$ and $P_{ji}$, measuring how often agents $i$ and $j$ are
Figure 3: Routing flow from a node in $H_k$ to a node in set $H_l$.



pairwise averaged. For each pair of agents $(i,j)$ we must specify how to satisfy the demand $D(i,j) = \pi(i)\pi(j) = 1/n^2$ by assigning flows to some (appropriately chosen) paths in $\mathcal{P}_{ij}$.

Our flow construction uses four different cases depending on whether $i$ and $j$ move horizontally or vertically:

Case 1 Suppose $i \in H_k$ and $j \in H_l$. To satisfy the demand, node $i$ assigns $\Theta(n^{-3})$ units to each path $(i, v, j)$, where $v \in V_r$ for some $r$. There are $\Theta(n)$ agents that move vertically, so the total flow that reaches $j$ can be made equal to $n^{-2}$. See Figure 3.

Case 2 Suppose $i \in V_k$ and $j \in V_l$. This is the same as the previous case, except that $\Theta(n^{-3})$ units are assigned to each path $(i, h, j)$ for $h \in H_r$.

Case 3 Suppose $i \in H_k$ and $j \in V_l$. To satisfy the demand, assign $n^{-2}$ to the direct path $(i,j)$.

Case 4 Suppose $i \in V_k$ and $j \in H_l$. We again assign $n^{-2}$ to the direct path $(i,j)$.

Our construction therefore only uses the edges in the graph between $H$ sets and $V$ sets. In other words, only the averaging between vertically moving and horizontally moving nodes allows information to spread fast in the network. The averaging between two nodes in $H$ or two nodes in $V$ could be omitted and the bound would not change in order. The total load on an edge $e = (h,v)$ between a horizontally moving agent and a vertically moving agent is the sum of the direct flow on $(h,v)$, the flows on the paths $(h,v,j)$ for all horizontally moving $j$, and the flows on the paths $(i,h,v)$ for all vertically moving $i$:




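$$f((h,v)) = \frac{1}{n^2} + \sum_{j \in \cup_l H_l}\Theta(n^{-3}) + \sum_{i \in \cup_k V_k}\Theta(n^{-3}) = \Theta(n^{-2}). \qquad (54)$$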



The same bound holds for $e = (v,h)$.
Finally, we calculate the capacity of the edges $(v,h)$. It is sufficient to calculate a lower bound on the probability that agents $v \in V_k$ and $h \in H_l$ average. Agent $v$ is selected with probability $1/n$. Based on our assumptions on the communication radius, $v$ can communicate with $O(\log n)$ neighbors. The probability that $v$ lands in a row within $r(n)$ of row $l$ is $\Theta(\sqrt{n^{-1}\log n})$, and the probability that $h$ lands within $r(n)$ of column $k$ is also $\Theta(\sqrt{n^{-1}\log n})$. Therefore we have

$$P_{vh} = \Omega\!\left(\frac{1}{n}\cdot\sqrt{\frac{\log n}{n}}\cdot\sqrt{\frac{\log n}{n}}\cdot\frac{1}{\log n}\right) = \Omega\!\left(n^{-2}\right).$$

The capacity of each edge $(v,h)$ is then $C(e) = \Omega(n^{-3})$. By symmetry, the same formulae hold for $(h,v)$.
We can now calculate the overload for this flow on any edge $e = (v,h)$:

$$\frac{f(e)}{C(e)} = \frac{\Theta(n^{-2})}{\Omega(n^{-3})} = O(n).$$

Since this holds for all edges, we have $\rho(F) = O(n)$. The maximum length of any path used in the flow is 2, so by the Poincaré inequality we have

$$T_{\mathrm{relax}}(W) = \frac{1}{1-\lambda_2(W)} \le 2\rho(F) = O(n). \qquad (57)$$

Theorem 1 gives the result. ■
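The four-case flow can also be checked exactly on a toy instance. The sketch below treats all horizontally moving agents as one pool and all vertically moving agents as another (the row and column indices do not affect the routing), splits each same-class demand evenly over all relays of the other class, and routes cross-class demands directly. It then divides the worst edge load by a capacity of $1/n^3$, a hypothetical normalization of the $\Omega(n^{-3})$ bound, and doubles the result for the maximum path length 2. With $k$ agents in each class the worst load is $(3 - 2/k)/n^2$, so the bound grows linearly in $n$.

```python
from collections import defaultdict
from fractions import Fraction

def bidirectional_flow(num_h: int, num_v: int):
    """Exact edge loads for the four-case flow, plus rho(F) and the bound 2*rho(F)."""
    H = [("h", a) for a in range(num_h)]
    V = [("v", b) for b in range(num_v)]
    agents = H + V
    n = len(agents)
    demand = Fraction(1, n * n)          # D(i, j) = pi(i) pi(j) = 1/n^2
    load = defaultdict(Fraction)
    for i in agents:
        for j in agents:
            if i == j:
                continue
            if i[0] == j[0]:
                # Cases 1 and 2: two-hop paths through every agent of the other class
                relays = V if i[0] == "h" else H
                share = demand / len(relays)
                for r in relays:
                    load[(i, r)] += share
                    load[(r, j)] += share
            else:
                # Cases 3 and 4: direct path between an H agent and a V agent
                load[(i, j)] += demand
    capacity = Fraction(1, n ** 3)       # hypothetical constant in C(e) = Omega(n^-3)
    rho = max(load.values()) / capacity
    return load, rho, 2 * rho            # maximum path length is 2

for k in (5, 10, 20):
    _, rho, bound = bidirectional_flow(k, k)
    print(f"n={2 * k:3d}  rho(F)={float(rho):6.2f}  relaxation bound={float(bound):6.2f}")
```

Every loaded edge joins a horizontally moving agent to a vertically moving one, which is the point made after Case 4.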

One intuition for this result is that bidirectional mobility enables the construction of "short" routes between all pairs of agents. The same arguments give the identical result for the torus: under bidirectional mobility the averaging time for the torus is $\Theta(n\log\epsilon^{-1})$, which is the same as under full mobility.

5.2.2 Unidirectional mobility 

We now prove a lower bound analogous to the one for the unidirectional mobility model on the torus, showing that unidirectional mobility does not improve the scaling performance for random geometric graphs.

Corollary 5 (Random geometric graph with 1D mobility). Consider gossip on the random geometric graph with $n$ agents under the 1D unidirectional mobility model. Then for this gossip algorithm,

$$T_{\mathrm{ave}}(n,\epsilon) = \Omega\!\left(\frac{n^2}{\log n}\log\epsilon^{-1}\right). \qquad (58)$$



Proof. We first divide the unit square into sub-squares of side length $c_1\sqrt{\log n/n}$ for some constant $c_1$. This creates a $\Theta(\sqrt{n/\log n}) \times \Theta(\sqrt{n/\log n})$ torus on which the mobility can be defined. We must first characterize the Markov chain corresponding to the gossip algorithm under the 1D unidirectional mobility model. If we set the communication radius to $c_2\sqrt{\log n/n}$, then an agent in the $i$-th row of sub-squares can communicate with agents in rows $\{i-c_3,\ldots,i+c_3\}$, where $c_3$ is again a constant. Moreover, each sub-square will have $\Theta(\log n)$ agents with high probability. Therefore we can upper bound the probability that an agent in row $i$ will average with an agent $j$ in one of the rows $\{i-c_3,\ldots,i-1,i+1,\ldots,i+c_3\}$:



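$$P_{ij} = O\!\left(\frac{1}{n}\cdot\frac{1}{\log n}\cdot\sqrt{\frac{\log n}{n}}\right) = O\!\left(\frac{1}{n\sqrt{n\log n}}\right). \qquad (59)$$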




Thus the chance a given agent averages with someone not in their row of sub-squares is 0{1/ logn). 
As for the torus, we apply the induced chain method using the partition that merges each row of sub-squares. This creates a new Markov chain with $\sqrt{n/\log n}$ states that is a kind of cycle, with positive transition probabilities from state $k$ (corresponding to the $k$-th row) to states $l \in \{k-c_3,\ldots,k+c_3\}$. From the analysis of the torus we can see that from row $k$ to row $l$:

$$W'_{kl} = \frac{1}{\sum_{i:F(i)=k}\pi_i}\sum_{i:F(i)=k}\;\sum_{j:F(j)=l}\pi_i W_{ij} \qquad (60)$$



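$$\le \frac{n}{\Theta(\sqrt{n\log n})}\cdot\frac{1}{n}\cdot\Theta(n\log n)\cdot O\!\left(\frac{1}{n\sqrt{n\log n}}\right) \qquad (61)$$

$$= O\!\left(\frac{1}{n}\right). \qquad (62)$$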

Let $\beta$ denote this transition probability. The matrix of this new chain is still circulant and generated by the vector

$$(\beta,\ldots,\beta,\,1-2c_3\beta,\,\beta,\ldots,\beta,\,0,\ldots,0). \qquad (63)$$

The DFT and a Taylor expansion again give the bound on the second-largest eigenvalue:

$$\lambda_2(W') = 1 - \beta\cdot O\!\left(\frac{\log n}{n}\right). \qquad (64)$$

Therefore $T_{\mathrm{relax}}(W') = \Omega(n^2/\log n)$. ■
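The DFT step behind (64) can be reproduced numerically. For the symmetric circulant chain on $N$ states that moves to each of the $c_3$ nearest states on either side with probability $\beta$ and stays put otherwise, the eigenvalues are $\lambda_k = 1 - 2c_3\beta + 2\beta\sum_{s=1}^{c_3}\cos(2\pi sk/N)$. The sketch assumes $N > 2c_3$ and $\beta \le 1/(2c_3)$, and compares $1-\lambda_2$ with the second-order Taylor approximation $4\pi^2\beta\sum_s s^2/N^2$, which is $\beta\cdot\Theta(\log n/n)$ when $N = \sqrt{n/\log n}$. The parameter values are illustrative.

```python
import math

def circulant_gap(N: int, c3: int, beta: float) -> float:
    """Spectral gap 1 - lambda_2 of the symmetric circulant chain described above."""
    assert N > 2 * c3 and 0 < beta <= 1 / (2 * c3)
    lam2 = max(
        1 - 2 * c3 * beta
        + 2 * beta * sum(math.cos(2 * math.pi * s * k / N) for s in range(1, c3 + 1))
        for k in range(1, N)
    )
    return 1 - lam2

def taylor_gap(N: int, c3: int, beta: float) -> float:
    """Second-order approximation 4 pi^2 beta sum_s s^2 / N^2 of the gap."""
    return 4 * math.pi ** 2 * beta * sum(s * s for s in range(1, c3 + 1)) / N ** 2

for N in (50, 200, 1000):
    g, t = circulant_gap(N, 2, 0.01), taylor_gap(N, 2, 0.01)
    print(f"N={N:5d}  gap={g:.3e}  taylor={t:.3e}  ratio={g / t:.5f}")
```

The ratio approaches 1 as $N$ grows, so the gap scales as $\beta/N^2$, which is the source of the $\Omega(n^2/\log n)$ relaxation time when $\beta = O(1/n)$.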



6 Experiments and simulations 

We can gain some intuition about the benefits of mobility via simulations. All simulations are for a torus with a linearly varying field. Our first main result was a lower bound stating that horizontal mobility is as bad as no mobility in terms of convergence. This is illustrated in Figure 4, where we can see that for a range of network sizes the error under horizontal mobility is close to that of the torus with no mobility. Indeed, as the network size grows, the gap vanishes, which suggests that our analysis is tight for this example. Our second major result was a positive one: the bidirectional mobility model is nearly as good as full mobility. This is illustrated in Figure 5. Although there is a gap between the error decay under the two mobility models, for a fixed error the number of iterations needed to achieve that error is at most a constant factor larger for the bidirectional mobility model.
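A small simulation in the spirit of Figure 4 can be sketched as follows. It is not the simulation code used for the figures; it assumes a side × side torus with one agent per cell, a linear initial field equal to row plus column index, and models fast-mixing mobility by redrawing positions at every iteration: under "horizontal" mobility agents are randomly permuted within their own row, and under "full" mobility across the whole torus. At each iteration a uniformly random agent averages with the agent in a uniformly random one of its four neighboring cells. The function returns the squared error relative to the initial squared error.

```python
import random

def gossip_error(side: int, iters: int, mode: str = "none", seed: int = 0) -> float:
    """Relative squared error after `iters` pairwise averaging steps.

    mode: "none" (static torus), "horizontal" (agents reshuffled within their
    row every step) or "full" (agents reshuffled over the whole torus).
    """
    rng = random.Random(seed)
    n = side * side
    x = [float(a // side + a % side) for a in range(n)]  # agent a starts in cell a
    mean = sum(x) / n                                    # preserved by averaging
    err0 = sum((v - mean) ** 2 for v in x)
    pos = list(range(n))                                 # pos[cell] = agent in that cell
    moves = ((0, 1), (0, -1), (1, 0), (-1, 0))
    for _ in range(iters):
        if mode == "full":
            rng.shuffle(pos)
        elif mode == "horizontal":
            for r in range(side):
                row = pos[r * side:(r + 1) * side]
                rng.shuffle(row)
                pos[r * side:(r + 1) * side] = row
        elif mode != "none":
            raise ValueError(f"unknown mode: {mode}")
        cell = rng.randrange(n)
        r, c = divmod(cell, side)
        dr, dc = rng.choice(moves)
        nbr = ((r + dr) % side) * side + (c + dc) % side
        i, j = pos[cell], pos[nbr]
        x[i] = x[j] = (x[i] + x[j]) / 2
    return sum((v - mean) ** 2 for v in x) / err0

for mode in ("none", "horizontal", "full"):
    print(mode, gossip_error(10, 3000, mode))
```

Redrawing positions independently at every step corresponds to the fast-mixing mobility assumed in the analysis; a random walk in the row, as in Figure 7, would replace the shuffle with a few local moves.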

Our final result was that adding $m$ mobile agents to a static grid with $n$ agents gives a convergence time of $O\!\left(\frac{n^2}{m}\log\epsilon^{-1}\right)$. Figure 6 shows how adding only a few additional mobile agents can dramatically improve the speed of convergence. As we add more mobile nodes, the log error decreases linearly, which corresponds to an exponential decay in the average error. This suggests that even in large networks, investing in a small number of mobile agents can yield a major benefit in convergence time.

Finally, we can simulate gossip under a different, random walk mobility model [22,23], which slows convergence. At each time step, we assume the agents move according to a random walk in their row. Figure 7 shows how the error behaves as a function of the number of random walk steps between gossip iterations. The dashed line indicates the error under the horizontal mobility model, which corresponds to the nodes moving according to the stationary distribution of the random walk. Although the random walk model seems difficult to handle analytically, these simulations indicate that our bounds may hold for such models as well.
[Plot: Error vs. rounds for horizontal and no mobility on the torus. x-axis: number of iterations (×10^4).]

Figure 4: Log average error versus number of iterations of the gossip algorithm for the torus with no mobility and with horizontal mobility. As the graph size increases, the gap between the two algorithms vanishes.



[Plot: Error vs. rounds for bidirectional and full mobility on the torus; legend includes n = 100. x-axis: number of iterations (×10^4).]

Figure 5: Log average error versus number of iterations of the gossip algorithm for the torus with full mobility and with bidirectional mobility. As the graph size increases, the gap between the two algorithms shrinks.
[Plot: Error vs. additional mobile nodes for torus, n = 400.]

Figure 6: Adding a few mobile nodes to a static grid can exponentially decrease the estimation error for a fixed number of iterations (20000).



[Plot: Error vs. random walk steps for torus, n = 225. x-axis: random walk steps per iteration.]

Figure 7: Error in the random walk model is lower bounded by the error under the fast-mixing horizontal mobility model.
7 Discussion and future directions



In this work we investigated how agent mobility impacts the convergence speed of distributed averaging algorithms by developing new analytical tools derived from the theory of Markov chains. Using these tools we showed that different mobility patterns can have dramatically different effects depending on the overlap of the mobility paths. Perhaps surprisingly, even a sublinear number of mobile nodes can change the order of the number of gossip messages required for convergence. We note that "mobility" in our model is a kind of time-varying network topology and in practical implementations need not come from the physical mobility of the agents.

The class of mobility models that are amenable to our analysis makes a strong assumption on the speed of the mobility or the delay tolerance of the gossip algorithm. One interesting direction for future research involves understanding the benefits of mobility for more realistic mobility models. General mobility models with memory may be amenable to analysis if the mobility is driven by a Markov chain, since this would integrate naturally with the Markov structure of the averaging process. We conjecture that random walk models with slower mixing times will yield smaller benefits, and that our independent (fast mixing) model always gives an upper bound on the benefit. For these models, modifying the pairwise gossip paradigm (cf. [17]) may yield a greater benefit than relying on mobility alone. The impact of node mobility on distributed optimization and general message-passing algorithms on probabilistic graphical models would also be a very interesting research direction.

Another interesting direction is understanding the impact of mobility on more general message-passing algorithms, for example for optimizing convex functions. The analysis of [31] obtains a convergence theorem similar to the spectral gap bound, and it would be interesting to investigate the scaling behavior of the number of iterations required by the min-sum algorithm to optimize a convex function under some node mobility model.

.1 Relaxation time for the torus plus central node 

We will construct a test function $g$ in (22) to show that a torus plus an additional central node $M$, with transition probabilities $\Theta((m/n)/(m+n))$ to $M$ and $\Theta((1/n)/(m+n))$ away from $M$, along with transitions of probability $\Theta(1/n)$ between neighbors in the torus, has relaxation time $\Omega(n^2/m)$. The stationary distribution for this chain has probability $\Theta(1/(m+n))$ on the nodes of the torus and $\Theta(m/(m+n))$ on $M$. Let $g(M) = 0$ and let $g$ be constant on each column of the torus, with the values on the columns being $\{-a, -a+1, \ldots, 0, 1, 2, \ldots, a, a, a-1, \ldots, -a+1, -a\}$, where $a = \Theta(\sqrt{n})$. Then clearly $\sum_i \pi_i g(i) = 0$. We can calculate the numerator and denominator in (22):




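$$\sum_{i,j}\pi_i W_{ij}\left(g(i)-g(j)\right)^2 = O(n)\cdot\Theta\!\left(\frac{1}{m+n}\right)\Theta\!\left(\frac{1}{n}\right) + \Theta\!\left(\frac{m}{n(m+n)^2}\right)\sum_{i\in S} g(i)^2 = O\!\left(\frac{mn}{(m+n)^2}\right), \qquad (66)$$

$$\sum_{i}\pi_i\, g(i)^2 = \Theta\!\left(\frac{1}{m+n}\right)\sum_{i\in S} g(i)^2 = \Theta\!\left(\frac{n^2}{m+n}\right), \qquad (67)$$

where we used $\sum_{i\in S} g(i)^2 = \Theta(n a^2) = \Theta(n^2)$.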



Dividing gives the result. 
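The extremal-characterization argument can be checked numerically on a small instance. The sketch builds a torus-plus-central-node chain with hypothetical constants (each torus neighbor move has probability $1/(4n)$, moves to and from $M$ have probabilities $m/(n(m+n))$ and $1/(n(m+n))$, and the remaining mass is a self-loop), evaluates the Rayleigh quotient of the column test function $g$ above, and compares it with the exact spectral gap computed from the symmetrized matrix. It assumes NumPy; since the chain is reversible, the gap can never exceed the quotient.

```python
import numpy as np

def torus_plus_center(side: int, m: int):
    """Torus of n = side^2 static states plus one merged mobile state M (index n)."""
    n = side * side
    W = np.zeros((n + 1, n + 1))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                j = ((r + dr) % side) * side + (c + dc) % side
                W[i, j] += 1.0 / (4 * n)          # Theta(1/n) torus moves
            W[i, n] = m / (n * (m + n))          # torus -> M
            W[n, i] = 1.0 / (n * (m + n))        # M -> torus
    W[np.diag_indices(n + 1)] += 1.0 - W.sum(axis=1)   # remaining mass stays put
    pi = np.full(n + 1, 1.0 / (m + n))
    pi[n] = m / (m + n)
    return W, pi

def rayleigh_quotient(W, pi, g):
    """Dirichlet form over variance: an upper bound on the spectral gap."""
    diff = g[:, None] - g[None, :]
    dirichlet = 0.5 * np.sum(pi[:, None] * W * diff ** 2)
    variance = pi @ (g - pi @ g) ** 2
    return dirichlet / variance

def spectral_gap(W, pi):
    """1 - lambda_2 for a reversible chain via the symmetrized matrix."""
    s = np.sqrt(pi)
    A = s[:, None] * W / s[None, :]
    eig = np.linalg.eigvalsh((A + A.T) / 2)    # ascending order
    return 1.0 - eig[-2]

side, m = 10, 10
W, pi = torus_plus_center(side, m)
cols = np.array([min(c, side - 1 - c) for c in range(side)], dtype=float)
cols -= cols.mean()                        # side = 10: [-2, -1, 0, 1, 2, 2, 1, 0, -1, -2]
g = np.append(np.tile(cols, side), 0.0)    # constant on columns, g(M) = 0
q = rayleigh_quotient(W, pi, g)
print(f"gap={spectral_gap(W, pi):.2e}  quotient={q:.2e}  m/n^2={m / side**4:.2e}")
```

Varying `side` and `m` in this sketch is one way to see the quotient track $m/n^2$ up to constants.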